Semi-supervised analysis of gene expression profiles for lineage-specific development in the Caenorhabditis elegans embryo
Abstract
MOTIVATION: Gene expression profiling is a powerful approach to identify genes that may be involved in a specific biological process on a global scale. For example, gene expression profiling of mutant animals that lack or contain an excess of certain cell types is a common way to identify genes that are important for the development and maintenance of given cell types. However, it is difficult for traditional computational methods, including unsupervised and supervised learning methods, to detect relevant genes from a large collection of expression profiles with high sensitivity and specificity. Unsupervised methods group similar gene expressions together while ignoring important prior biological knowledge. Supervised methods utilize training data from prior biological knowledge to classify gene expression. However, for many biological problems, little prior knowledge is available, which limits the prediction performance of most supervised methods.
RESULTS: We present a Bayesian semi-supervised learning method, called BGEN, that improves upon supervised and unsupervised methods by both capturing relevant expression profiles and using prior biological knowledge from literature and experimental validation. Unlike currently available semi-supervised learning methods, this new method trains a kernel classifier based on labeled and unlabeled gene expression examples. The semi-supervised trained classifier can then be used to efficiently classify the remaining genes in the dataset. Moreover, we model the confidence of microarray probes and probabilistically combine multiple probe predictions into gene predictions. We apply BGEN to identify genes involved in the development of a specific cell lineage in the C. elegans embryo, and to further identify the tissues in which these genes are enriched. Compared to K-means clustering and SVM classification, BGEN achieves higher sensitivity and specificity. We confirm certain predictions by biological experiments.
AVAILABILITY: The results are available at http://www.csail.mit.edu/~alanqi/projects/BGEN.html.
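The abstract says that probe confidence is modeled and that multiple probe predictions are combined probabilistically into gene predictions, but it does not give the formula. The sketch below shows one plausible scheme and is not the authors' method: it assumes each probe yields a probability that its gene belongs to the lineage, plus a confidence weight in [0, 1], and it averages the probes' log-odds using those weights. The function name `combine_probe_predictions` and both parameters are hypothetical.

```python
import math


def combine_probe_predictions(probe_probs, probe_confidences):
    """Combine per-probe probabilities into one gene-level probability.

    Assumption (not taken from the paper): each probe's log-odds is weighted
    by a confidence in [0, 1], and the weighted mean log-odds is mapped back
    to a probability with the logistic function.
    """
    if len(probe_probs) != len(probe_confidences):
        raise ValueError("need exactly one confidence per probe")

    total_weight = sum(probe_confidences)
    if total_weight == 0:
        # No trusted probe: fall back to an uninformative prediction.
        return 0.5

    eps = 1e-9  # keeps log-odds finite for probabilities of exactly 0 or 1
    weighted_log_odds = 0.0
    for p, w in zip(probe_probs, probe_confidences):
        p = min(max(p, eps), 1.0 - eps)
        weighted_log_odds += w * math.log(p / (1.0 - p))

    mean_log_odds = weighted_log_odds / total_weight
    return 1.0 / (1.0 + math.exp(-mean_log_odds))


# Hypothetical gene measured by three probes; the third probe is less trusted.
gene_probability = combine_probe_predictions([0.9, 0.8, 0.3], [1.0, 1.0, 0.2])
```

Working in log-odds means a low-confidence probe that disagrees with the others pulls the gene-level prediction only slightly, while a probe with zero confidence is ignored entirely.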
Similar resources
Network inference of pal-1 lineage-specific regulation in the C. elegans embryo by structural equation modeling
The elucidation of spatial and temporal control during developmental stages is one of the central tasks for systems biology, and a variety of intracellular factors are known as regulators for specific gene expression. The activity information of those various factors is not directly reflected in their gene expression profiles. Hence, a method based on Structural Equation Modeling (SEM) is descr...
Tocotrienol Modulates the Expression of Proteins in Oxidative Stress-Induced Caenorhabditis Elegans
Objective: Oxidative stress that damages proteins results in aging and age-related diseases. The aim of this study is to determine the effect of tocotrienol rich fraction (TRF) on the expression of proteins in oxidative stress-induced Caenorhabditis elegans (C. elegans), which has genes homologous to those of humans. Methods: The worms were treated with TRF prior to, after and continuously in separate group...
Identification of lineage-specific zygotic transcripts in early Caenorhabditis elegans embryos.
During Caenorhabditis elegans embryogenesis, a maternally supplied transcription factor, SKN-1, is required for the specification of the mesendodermal precursor, EMS, in the 4-cell stage embryo. When EMS divides, it gives rise to a mesoderm-restricted precursor, MS, and an endoderm-restricted precursor, E. To systematically identify genes that function as key regulators of MS and/or E-derived t...
A High-Fidelity Cell Lineage Tracing Method for Obtaining Systematic Spatiotemporal Gene Expression Patterns in Caenorhabditis elegans
Advances in microscopy and fluorescent reporters have allowed us to detect the onset of gene expression on a cell-by-cell basis in a systemic fashion. This information, however, is often encoded in large repositories of images, and developing ways to extract this spatiotemporal expression data is a difficult problem that often uses complex domain-specific methods for each individual data set. W...
Automated lineage and expression profiling in live Caenorhabditis elegans embryos.
Describing gene expression during animal development requires a way to quantitatively measure expression levels with cellular resolution and to describe how expression changes with time. Fluorescent protein reporters make it possible to measure expression dynamics in live cells by time-lapse microscopy, but it can be challenging to identify expressing cells in complex tissues and to compare exp...
Journal:
- Bioinformatics
Volume 22, Issue 14
Pages: -
Publication date: 2006